Generation#
The generation module provides text generation with language models via distortion-guided beam search. It extends standard beam search by incorporating distortion probabilities derived from the observed input sequences.
Key Components#
- token_transformation_to_probs: Transforms an observed sequence into token indices and their probabilities.
- get_distortion_probs: Computes distortion probabilities for a batch of observed sequences.
- distortion_probs_to_cuda: Transfers distortion probabilities into a CUDA tensor.
- distortion_guided_beam_search: Implements the main beam search algorithm with distortion guidance.
- process_reward_beam_search: A beam search variant that additionally uses a prompted causal language model as a process reward (documented below).
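Roughly, the first three helpers feed the last two: token_transformation_to_probs maps one observed string to candidate tokens, get_distortion_probs batches that over all beams, and distortion_probs_to_cuda materializes the result as a tensor that the beam search consumes at each decoding step. The sketch below only illustrates that data flow; the `model` variable, the example sentence, and the assumption that these functions are bound as methods (as their `self` parameters suggest) are not part of the documented API.

```python
# Illustrative data flow only; `model`, the method binding, and the example
# sentence are assumptions, not documented API.
observed = "我喜欢吃平果"  # hypothetical misspelled input ("平果" for "苹果")

# One observed string -> candidate token ids, probabilities, original lengths.
token_ids, token_probs, token_lengths = model.token_transformation_to_probs(observed)

# A batch of per-beam observed sequences -> flat index/probability lists.
(batch_idx, beam_idx, token_idx, dist_probs,
 beam_token_lengths, force_eos) = model.get_distortion_probs(
    [[observed]], eos_token_id=model.config.eos_token_id)

# distortion_probs_to_cuda (signature not documented on this page) then
# scatters the flat lists into a dense GPU tensor that
# distortion_guided_beam_search adds to the LM log-probabilities each step.
```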
API Documentation#
- lmcsc.generation.token_transformation_to_probs(self, observed_sequence: str) → Tuple[List[int], List[float], dict] [source]#
Transforms an observed sequence into token indices and their corresponding probabilities.
- Parameters:
observed_sequence (str) – The input sequence to be transformed.
- Returns:
- A tuple containing:
List of token indices.
List of corresponding probabilities.
Dictionary of original token lengths.
- Return type:
Tuple[List[int], List[float], dict]
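For intuition, the return values might look as follows. The numbers are invented, and `model` stands for whatever object this method is bound to (the self parameter suggests it is a method); the exact key scheme of the lengths dictionary is not documented on this page.

```python
token_ids, probs, lengths = model.token_transformation_to_probs("平果")
# token_ids -> e.g. [7231, 5012, 880]    vocabulary indices of candidate tokens
# probs     -> e.g. [0.62, 0.21, 0.05]   distortion probability per candidate
# lengths   -> e.g. {7231: 2, 5012: 2}   original-length bookkeeping (a dict;
#                                        key scheme not documented here)
```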
- lmcsc.generation.get_distortion_probs(self, batch_observed_sequences: List[List[str]], eos_token_id: int) → Tuple[List[int], List[int], List[int], List[float], List[List[dict]], List[bool]] [source]#
Computes distortion probabilities for a batch of observed sequences.
- Parameters:
batch_observed_sequences (List[List[str]]) – A batch of observed sequences.
eos_token_id (int) – The end-of-sequence token ID.
- Returns:
- A tuple containing:
List of batch indices.
List of beam indices.
List of token indices.
List of distortion probabilities.
List of original token lengths for each beam.
List of boolean values indicating if EOS is forced.
- Return type:
Tuple[List[int], List[int], List[int], List[float], List[List[dict]], List[bool]]
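The flat lists returned here must become a dense tensor before they can be added to the model's log-probabilities; that scatter step is presumably what distortion_probs_to_cuda performs (its exact signature is not documented on this page). The runnable toy below reconstructs the idea with invented values in the shapes documented above.

```python
import torch

num_beams, vocab_size = 2, 10

# Invented outputs in the documented shapes (single batch, two beams).
batch_indices = [0, 0, 0]
beam_indices = [0, 0, 1]
token_indices = [3, 7, 3]
distortion_probs = [0.6, 0.3, 0.5]

# Dense (batch * beams, vocab) tensor with a small probability floor so that
# unmatched tokens are penalized but not forbidden after taking the log.
dense = torch.full((1 * num_beams, vocab_size), 1e-5)
rows = torch.tensor(batch_indices) * num_beams + torch.tensor(beam_indices)
dense[rows, torch.tensor(token_indices)] = torch.tensor(distortion_probs)

# In the real function the tensor would live on the GPU (hence "to_cuda");
# the beam search can then add dense.log() to the LM log-probabilities.
print(dense.log())
```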
- lmcsc.generation.distortion_guided_beam_search(self, observed_sequence_generator: BaseObversationGenerator, beam_scorer: BeamScorer, input_ids: LongTensor = None, logits_processor: LogitsProcessorList | None = None, stopping_criteria: StoppingCriteriaList | None = None, max_length: int | None = None, pad_token_id: int | None = None, eos_token_id: int | List[int] | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, output_scores: bool | None = None, return_dict_in_generate: bool | None = None, synced_gpus: bool = False, streamer: BaseStreamer | None = None, **model_kwargs) → GenerateBeamDecoderOnlyOutput | GenerateBeamEncoderDecoderOutput | LongTensor [source]#
A modified beam search function for CSC (Chinese Spelling Correction).
Notes
This code is based on the beam_search function in the transformers library. We make five modifications to the original code:
1. Initialization.
2. Intervention in the decoding process (a toy sketch follows these notes).
3. Updating of the observed sequences.
4. Removal of stopping_criteria.
5. Streaming of generated results via the Streamer.
Search for ## Modification X.* in the code to find the corresponding parts.
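The intervention step is the conceptual core: at each decoding step, distortion scores are combined with the language model's next-token scores before the beams are re-ranked. The runnable toy below is independent of the library's actual code and only illustrates that combination with invented values.

```python
import torch

num_beams, vocab_size = 2, 10
lm_log_probs = torch.randn(num_beams, vocab_size).log_softmax(dim=-1)

# Distortion mass concentrated on a token consistent with the observed text,
# with a small floor elsewhere (invented values).
distortion = torch.full((num_beams, vocab_size), 1e-5)
distortion[:, 3] = 0.9

combined = lm_log_probs + distortion.log()       # joint re-ranking score
next_tokens = combined.topk(num_beams, dim=-1).indices
print(next_tokens)  # token 3 ranks first in both beams
```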
- Parameters:
observed_sequence_generator (BaseObversationGenerator) – An instance of [BaseObversationGenerator] that defines how observed sequences are generated.
beam_scorer (BeamScorer) – A derived instance of [BeamScorer] that defines how beam hypotheses are constructed, stored, and sorted during generation. For more information, read the documentation of [BeamScorer].
input_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) – The sequence used as a prompt for the generation.
logits_processor (LogitsProcessorList, optional) – An instance of [LogitsProcessorList]. List of instances of class derived from [LogitsProcessor] used to modify the prediction scores of the language modeling head applied at each generation step.
stopping_criteria (StoppingCriteriaList, optional) – An instance of [StoppingCriteriaList]. List of instances of class derived from [StoppingCriteria] used to tell if the generation loop should stop.
max_length (int, optional, defaults to 20) – DEPRECATED. The maximum length of the sequence to be generated; use logits_processor or stopping_criteria directly to cap the number of generated tokens instead.
pad_token_id (int, optional) – The id of the padding token.
eos_token_id (Union[int, List[int]], optional) – The id of the end-of-sequence token. Optionally, use a list to set multiple end-of-sequence tokens.
output_attentions (bool, optional, defaults to False) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more details.
output_hidden_states (bool, optional, defaults to False) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more details.
output_scores (bool, optional, defaults to False) – Whether or not to return the prediction scores. See scores under returned tensors for more details.
return_dict_in_generate (bool, optional, defaults to False) – Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.
synced_gpus (bool, optional, defaults to False) – Whether to continue running the while loop until max_length (needed for ZeRO stage 3).
model_kwargs – Additional model-specific kwargs that will be forwarded to the forward function of the model. If the model is an encoder-decoder model, the kwargs should include encoder_outputs.
- Returns:
[~generation.GenerateBeamDecoderOnlyOutput], [~generation.GenerateBeamEncoderDecoderOutput] or torch.LongTensor: A torch.LongTensor containing the generated tokens (default behaviour); a [~generation.GenerateBeamDecoderOnlyOutput] if return_dict_in_generate=True and model.config.is_encoder_decoder=False; or a [~generation.GenerateBeamEncoderDecoderOutput] if return_dict_in_generate=True and model.config.is_encoder_decoder=True.
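A call might be wired up as below. This is a sketch under assumptions: the method is taken to be bound on `model` (per its self parameter), the BaseObversationGenerator constructor shown is hypothetical (its construction is not documented on this page), and BeamSearchScorer is the standard transformers scorer.

```python
import torch
from transformers import BeamSearchScorer

num_beams = 8
beam_scorer = BeamSearchScorer(batch_size=1, num_beams=num_beams, device=model.device)

# Hypothetical constructor; how observation generators are actually built
# is not documented on this page.
obs_generator = BaseObversationGenerator(["我喜欢吃平果"])

# One row per beam, each starting from the BOS token.
input_ids = torch.full(
    (num_beams, 1), model.config.bos_token_id, dtype=torch.long, device=model.device
)

outputs = model.distortion_guided_beam_search(
    observed_sequence_generator=obs_generator,
    beam_scorer=beam_scorer,
    input_ids=input_ids,
    pad_token_id=model.config.pad_token_id,
    eos_token_id=model.config.eos_token_id,
    return_dict_in_generate=False,  # -> torch.LongTensor of generated tokens
)
```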
- lmcsc.generation.process_reward_beam_search(self, observed_sequence_generator: BaseObversationGenerator, prompted_model: AutoModelForCausalLM, beam_scorer: BeamScorer, input_ids: LongTensor = None, prompted_input_ids: LongTensor = None, prompted_model_kwargs: dict = None, logits_processor: LogitsProcessorList | None = None, stopping_criteria: StoppingCriteriaList | None = None, max_length: int | None = None, pad_token_id: int | None = None, eos_token_id: int | List[int] | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, output_scores: bool | None = None, return_dict_in_generate: bool | None = None, synced_gpus: bool = False, streamer: BaseStreamer | None = None, **model_kwargs) → GenerateBeamDecoderOnlyOutput | GenerateBeamEncoderDecoderOutput | LongTensor [source]#
A modified beam search function for CSC (Chinese Spelling Correction) that additionally uses a prompted causal language model (prompted_model) as a process reward during decoding.
Notes
This code is based on the beam_search function in the transformers library. As with distortion_guided_beam_search, we make five modifications to the original code:
1. Initialization.
2. Intervention in the decoding process.
3. Updating of the observed sequences.
4. Removal of stopping_criteria.
5. Streaming of generated results via the Streamer.
Search for ## Modification X.* in the code to find the corresponding parts.
- Parameters:
observed_sequence_generator (BaseObversationGenerator) – An instance of [BaseObversationGenerator] that defines how observed sequences are generated.
beam_scorer (BeamScorer) – A derived instance of [BeamScorer] that defines how beam hypotheses are constructed, stored, and sorted during generation. For more information, read the documentation of [BeamScorer].
input_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) – The sequence used as a prompt for the generation.
logits_processor (LogitsProcessorList, optional) – An instance of [LogitsProcessorList]. List of instances of class derived from [LogitsProcessor] used to modify the prediction scores of the language modeling head applied at each generation step.
stopping_criteria (StoppingCriteriaList, optional) – An instance of [StoppingCriteriaList]. List of instances of class derived from [StoppingCriteria] used to tell if the generation loop should stop.
max_length (int, optional, defaults to 20) – DEPRECATED. The maximum length of the sequence to be generated; use logits_processor or stopping_criteria directly to cap the number of generated tokens instead.
pad_token_id (int, optional) – The id of the padding token.
eos_token_id (Union[int, List[int]], optional) – The id of the end-of-sequence token. Optionally, use a list to set multiple end-of-sequence tokens.
output_attentions (bool, optional, defaults to False) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more details.
output_hidden_states (bool, optional, defaults to False) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more details.
output_scores (bool, optional, defaults to False) – Whether or not to return the prediction scores. See scores under returned tensors for more details.
return_dict_in_generate (bool, optional, defaults to False) – Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.
synced_gpus (bool, optional, defaults to False) – Whether to continue running the while loop until max_length (needed for ZeRO stage 3).
model_kwargs – Additional model-specific kwargs that will be forwarded to the forward function of the model. If the model is an encoder-decoder model, the kwargs should include encoder_outputs.
- Returns:
[~generation.GenerateBeamDecoderOnlyOutput], [~generation.GenerateBeamEncoderDecoderOutput] or torch.LongTensor: A torch.LongTensor containing the generated tokens (default behaviour); a [~generation.GenerateBeamDecoderOnlyOutput] if return_dict_in_generate=True and model.config.is_encoder_decoder=False; or a [~generation.GenerateBeamEncoderDecoderOutput] if return_dict_in_generate=True and model.config.is_encoder_decoder=True.
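Relative to distortion_guided_beam_search, the signature adds prompted_model, prompted_input_ids, and prompted_model_kwargs, which supply the second, prompted causal LM; their exact contents are not described on this page. The sketch below only shows where they slot in, reusing the variables from the previous example; the prompted_* values are placeholders.

```python
outputs = model.process_reward_beam_search(
    observed_sequence_generator=obs_generator,
    prompted_model=prompted_model,            # an AutoModelForCausalLM
    beam_scorer=beam_scorer,
    input_ids=input_ids,
    prompted_input_ids=prompted_input_ids,    # prompt tokens for the prompted model (assumed)
    prompted_model_kwargs={},                 # extra kwargs for the prompted model (assumed)
    pad_token_id=model.config.pad_token_id,
    eos_token_id=model.config.eos_token_id,
)
```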